ABSTRACT

A study explored the written compositions of 
elementary school students (ages 6-14) and the vocabulary they use. 
Compositions were written by a large national sample of over 4,000 
children, who were given free rein to write whatever they wanted; 
thus the study provides status information on the vocabulary that 
children currently use. The study also explored relationships between 
vocabulary and the communicativeness of the compositions and between 
vocabulary and reading comprehension. A major outcome of the study 
was the compilation of a lexicon of children's written vocabulary. 
That word list shows huge cultural changes in the vocabulary that 
children use when compared to word lists (still in common use today) 
which are 40 to 60 years old. Results also included analyses of token 
production, type production, type-token relationships, and spelling
error production. (Two tables of data and six figures are included, 
and 107 references are attached. Two appendixes contain a list of the 
500 most frequent words, and a list of common misspellings.) (SR) 
Preface



The monograph series in Language Education was established to publish re-
search and theoretical statements that cannot be accommodated in the limited
space provided by the typical education journal. Previous monographs have
treated secondary reading, adult reading habits, perspectives on comprehen-
sion, and reading English as a second language. This present monograph
pushes into the little-explored area of the written compositions of elementary
school children and the vocabulary children use.

This study by Smith and Ingersoll provides status information on the vocabulary
that children use when they are given free rein to write whatever they want.
Relationships between vocabulary and the communicativeness of the com-
positions, and between vocabulary and reading comprehension, are also
explored. A huge change has taken place in the written vocabulary of children
over the past forty years. Smith and Ingersoll make quantitative comparisons
between their list and those of Rinsland and others. In view of the stock placed
in several word lists that are forty to sixty years old, curriculum developers and
teachers can gain valuable insights for their work from the results reported in
this study.

—Leo C. Fay




Introduction 



In his assessment of writing research, Graves (1978, 1981) was forced to
conclude that research and curricular development in children's writing has not
been a major focus in American education. Efforts in language arts research
and development are overwhelmingly aimed at reading. The wide disparity in 
effort is reflected in Graves's observation that for each $1 spent on teaching 
children to communicate through writing, schools spend nearly $3000 on teach- 
ing them to decode written communication— that is, to read. In one recent 
conference, Whitman (1981) went so far as to conclude that writing research
lags 50 to 100 years behind reading research. The result of this disparity in 
effort has been a general paucity of knowledge about productive language 
behavior as seen in children's writing. 

In many ways the lack of research in writing is curious since the link between 
children's productive vocabulary, language skills, and reading comprehension 
has been firmly established. One purpose of the research reported in this 
monograph is to respond to this void through a study of a large national sample
of compositions written by children in grades 1 through 8, ages 6-14. Several
products result from this study, not all of which are included herein. A major 
outcome of this research has been the compilation of a lexicon of children's 
written vocabulary. That word list, in turn, has been the subject of a series of 
quantitative and qualitative analyses. Additionally, the compositions themselves
have been the subject of a variety of studies. The breadth of the sample of
compositions available for analysis in this study is so wide that it will be several 
years before all research hypotheses are exhausted. For this report, attention 
is concentrated on the vocabulary production in these compositions and a 
limited set of relationships. 

Composition Research

The vast majority of literature on composition deals with classroom tech- 
niques, lesson ideas, and motivation schemes. While these are interesting and 
useful for the practitioner, they do not address the critical issue of determining 
if, in fact, composition skills are being learned and refined over time. 

For a historical perspective on the research in composition we recommend 
a report by VanDeWeghe (1978) in which he examines the research response
to the Braddock Report (1963), which posed 24 research questions about written
composition. Another historical view is gained from Smith (1967), who outlined
research and evaluation instruments dating from 1912. Those sources indicate
a long-standing interest in holistic rating scales and, more recently, in the
measurement of mechanistic aspects of composition, especially syntax. 

Since research often depends on evaluation techniques, perhaps the relative
dearth of research in composition can be explained by the two major criticisms
of composition testing: lack of test instruments and inadequate and artificial
test environment. In Buros' (1975) English Tests and Reviews, there are no
normed overall composition tests targeted. Standardized tests deal instead with
many discrete aspects of writing, such as grammar, punctuation, vocabulary,
and spelling. While researchers agree that these factors certainly are highly 
correlated with the quality of writing, they do not, in and of themselves, measure 
what is and is not good composition. 

Tests of writing mechanics as measures of quality of writing come under fire 
frequently in the literature. Newkirk (1977) lists five major criticisms of these 
writing tests: 

—Many have little actual writing. They depend on multiple choice or fill-in 
responses. 

—Most have questionable content validity. They do not establish that they really 
test what they purport to test. 

—Most provide inadequate time for students to produce their best efforts. 

—Most provide inadequate motivation for students to perform well. 

—Most incorporate improper, irresponsible, or inaccurate methods of inter-
preting and using the results that are offered.

McCleary (1979) examines the topics of composition tests. Nowhere in the 
research does he find it demonstrated that the subjects about which the students 
are asked to write are reliable or valid. He hypothesizes that the student's 
experience with the topic, interest in the topic, and the manner in which the 
topic is introduced and explained will have profound influence upon the child's
ability to perform. 

Perron (1976) substantiates this view with a series of studies that show how 
the mode of discourse affects the quality of student writing as judged by syntactic
maturity. He examined 52 students in each of the third, fourth, and fifth grades 
and found significant differences in their writing in the various modes: argu- 
mentation, exposition, narration, and description. Specifically he discovered that 
argumentation topics rendered the most complex and mature syntax and that 
high-ability students performed better on these modes than other students.
Descriptive writing tended to be the least complex with exposition and narration
in the middle. Perron cites many other studies that similarly found that an
individual student will perform quite differently depending on the kind of writing 
task required. Lloyd-Jones (1977) also found that good writing in one mode 
does not necessarily mean equal success in another mode. 

One should not, on the other hand, be overly ready to discard mechanistic 
variables in evaluating compositions. While such variables may lack the artistic 
appeal of more qualitative approaches, they provide a reliable base on which
to make inference. Qualitative ratings, without a common base, run the risk of 
low reliability. 
Spelling

Spelling skills among children continue to baffle educators. No one is sure 
exactly why some people seem to spell effortlessly and others resist all attempts 
to improve. In analyzing the mechanics of the spelling task, Simon (1976) noted 
that a two-part process is apparently at work. A child searches for a word in 
long-term memory and, failing to find it, generalizes a graphemic pattern from 
the options for that phoneme, i.e., sounds it out. She further classifies all
spelling errors into four categories: 

1. Errors of perception

—from idiosyncratic mispronunciations, e.g., "warter" for "water"

—homophones (e.g., "to" and "too")

—unknown words which are attempted totally phonetically

2. Errors of generation

—mis-application of phonetic patterns

—poor selection of ambiguous phonetic patterns

—mis-application of spelling rules and exceptions to spelling rules

3. Errors of production

—handwriting errors, e.g., "b" for "d"

4. Errors of checking

—"silly" errors which do not reflect lack of knowledge

—omissions, transpositions, substitutions

Others have gone a step further in attempts to explain why some people are 
better spellers than others. Nicholson and Schacter (1979) have identified three 
kinds of knowledge present in good spellers. First, good spellers tend to have 
a strong sense of language, a sense of what is right and wrong. Second, good
spellers have a system of internalized rules about spelling. Some of these may have
been taught in schools, and some come from experience with language and 
mental deduction of English spelling patterns. These children, therefore, can 
predict how to spell words. The third knowledge good spellers possess is visual 
memory. Because of their keen system of internalized rules about spelling, 
good spellers can identify those words that violate the rules and simply learn
to memorize the unpredictable words. 

Walker (1974) studied the visual memory aspect of spelling and concluded 
it to be the single most significant variable distinguishing good and poor spellers. 
One hundred forty-six students were given the Mental Imagery Test. This test
called on subjects to remember geometric images for certain periods of time and
bring them to mind upon demand. The same subjects were then given 108
"demon" spelling words to write. Errors in spelling were classified as "P" (due
to faulty pronunciation or inappropriate phonic generalization) or "V" (due to
unexplained causes). The "V" errors were assumed to be on words that did
not follow any rule pattern and, therefore, had to be memorized. The results
indicated that females made significantly fewer errors of both types and that
good visualizers of both sexes on the Mental Imagery Test made significantly 
fewer "V" type errors. Walker proposes that very good spellers may possess 
a genetic gift for remembering graphic symbols. Once again, it seems that it is 
helpful to choose your parents carefully, hardly a helpful learning principle. 

Other studies indicate that students internalize an efficient system of personal 
spelling rules. The developmental aspect of spelling was studied in three pieces
of research. By showing how spelling changes over time, Beers and Henderson
(1977) confirmed the notion that children build an internalized system of rules 
that are sophisticated and become effective very early. Early first-grade at-
tempts at spelling rely on a letter-name strategy. Later, refined vowel knowledge
and orthographic pattern knowledge begin to improve spelling before any
formal rules instruction takes place. It would appear that exposure to language
improves spelling. Schwartz and Doehring (1977) lend further credence to this
theory. They looked at 20 good and 20 poor spellers in each of grades 2 through
5. The children were asked to spell nonsense words which would test their
knowledge of orthographic patterns. Clearly it was demonstrated that spelling
is developmental in nature; i.e., older spellers performed better even "before
the beginning of formal spelling instruction." Further, it was shown that ". . .
poor spellers lagged behind the good spellers in pattern acquisition by about
two years."

Zutell (1978) also found that children learn to spell by themselves through 
exposure. His work indicates that children form letter strategies very early in
their school career and often over-generalize their internalized rules until re- 
peated exposure and instructional feedback refine their spelling toward con- 
ventional usage. 

Templeton's (1979) work corroborates Zutell's and shows that children ". . . do
not always expect a one-symbol one sound correspondence."

Among the most-blamed causes of poor spelling has been faulty pronunci-
ation or dialectic influences on oral language. Groff (1973, 1978) discusses and
rejects much of the previous research in dialect influences (particularly black
inner-city) on spelling. He claims that most of the good research in this area
was done with very young subjects. Because it is established that young spellers
of all dialects rely on a sound-symbol spelling method until exposure refines
their internalized patterns, Groff believes that the influences of dialect are less-
ened over time. His studies did, in fact, indicate that by the middle and upper
elementary grades dialect-related influences in spelling errors were essentially
gone and that over time dialect does not interfere with good spelling.

Vocabulary 

A dozen or so lists of children's vocabulary are currently available for use by 
teachers and publishers (e.g., Thorndike, Rinsland, Gates, Dolch, etc.). Most
of the lists were constructed many years ago and have been criticized as
outdated. Common criticisms suggest that today's child, through
travel, television, and tradebooks, is a more sophisticated creature than his or
her 1940s counterpart. The older lists do not reflect the many new words that
are commonly part of today's school-age vocabulary.

Johnson and Majere (1977) revised the Dolch list and published a new basic
sight word list of 306 words they feel are most commonly used in today's reading
series. Rhode (1977) added a new dimension to her list by offering many new
phrases and technological terms that she believes are part of oral vocabularies
of children and should, therefore, be in reading vocabularies. Her modern list
includes such terms as "cheeseburger" and "pantsuit."

The construction of vocabularies has been consistent from the early lists to 
the current ones. Vocabulary items are selected by looking at children's liter- 
ature: basal reading series, trade books, dictionaries, and classroom materials 
of various kinds. Usually with the aid of computers, these words are sorted and
edited according to frequency of use. Some lists further delineate the words 
by the age at which they are typically added to a child's vocabulary. 
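
The sort-by-frequency step described above can be illustrated with a minimal Python sketch. It is not the procedure used by any of the list makers cited here; the function name `frequency_list`, the sample sentences, and the rule that a word is any run of letters or apostrophes are all assumptions made for illustration.

```python
import re
from collections import Counter


def frequency_list(texts, top_n=None):
    """Return (word, count) pairs, most frequent first; ties are alphabetical."""
    counts = Counter()
    for text in texts:
        # Assumption: a "word" is a lowercased run of letters or apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical sample material standing in for basal readers or trade books.
sample_texts = ["The dog ran. The dog sat.", "A cat sat."]
ranked = frequency_list(sample_texts)
for word, count in ranked:
    print(f"{word:10s} {count}")
```

A real list would also need editorial decisions that this sketch ignores, such as how to treat inflected forms, proper names, and misspellings.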

Word Lists 

Historically, word counts and word lists can be traced to the tenth and fifteenth 
centuries (de Rocher, Miron and Patton, 1973). Those early attempts were
predominately biblical concordances. However, by 1721, Nathaniel Bailey had
compiled an extensive listing of the English vocabulary which was to serve as
the basis of a dictionary. At the turn of the nineteenth century, the Rev. J.
Knowles listed 350 most frequently used words from selected literature. Again,
however, Knowles' primary source was the Bible and biblical literature.

Early word lists were typically used to specify appropriate spelling lists for 
school children. Chancellor (1910), for example, compiled a list of some 20,000 
words from dictionaries and spellers. He then identified a set of 1,000 words
which he deemed most important to teaching spelling to school children. Chan-
cellor did not, however, provide any empirical base for his decisions or any
documentation to justify his choices. 

An early mode of selecting words was the use of adult writing samples. Ayres
(1913) found 2,001 different words used in personal and business letters that 
he then proposed to serve as a base for spelling instruction. How he selected 
those letters, however, is unclear. 

In another study using correspondence, Andersen (1921) asked his students
to bring in personal letters received by parents and friends. He then compiled
a corpus of 9,223 words, of which 3.^^17 had a frequency of one. His entire
corpus yielded only 126 different misspelled words. Quite clearly this sampling
procedure would lead to a biased sample since the letters produced by the 
students were most likely unimaginative and certainly not terribly personal. 
Furthermore, spelling errors would likely be minimized due to personal editing 
prior to mailing. 

In another study, Clarke (1921) analyzed letters to the editor that appeared
in Chicago newspapers. The resulting corpus yielded a total of 28,292 running 
words (Tokens) and 3,360 different words (Types) from 2,000 such letters. A 
large number of words that appeared in the Clarke list were missing from the
Ayres list and vice versa. Both lists reflect biased sources. 

The landmark research of Thorndike (1921; Thorndike and Lorge, 1944) 
dominated the study of reading vocabulary for years and became the keystone 
upon which later studies were built. Thorndike and Lorge compiled word lists 
based on children's and adults' reading material. The resulting volumes served 
as the foundation for vocabulary in reading series and for experimental materials 
in research on verbal processes for the next fifty years. 

The more recent compilation of vocabulary based on children's literature by 
Carroll and his associates (Carroll, Davies and Richman, 1971) and the analysis 
of the Brown University corpus of 1,000,000 words (an adult corpus) by Kucera
and Francis (1967) moved the study of vocabulary lists forward substantially. 

The Carroll, Davies, and Richman (1971) compilation of the "Word Frequency 
Book" was initiated to identify an appropriate corpus of words for the American 
Heritage School Dictionary. Those researchers identified target reading ma-
terials through a survey sent to schools across the nation. Those target materials
were then sorted into 22 subject areas, and 200 word samples were taken from 
each. The result was a computer-assembled listing of 5,088,721 Tokens (total
words) which were sorted into 86,741 Types (different words). Most analyses
performed by Carroll, Davies and Richman followed Herdan's (1960) lognormal
model. However, like the Thorndike and Lorge word counts, the materials reflect
written material that children may encounter rather than that which they produce.

A particularly cogent study from the vantage point of the present research
was conducted by Rinsland (1945). Rinsland compiled an extensive set of
children's compositions. In 1936, under funding from the Works Projects Admin-
istration, Rinsland had each word from every composition copied onto a file
card. These file cards were then sorted and tabulated. The result was the first
major analysis of children's written vocabulary. In Rinsland's study, slightly more
than 100,000 children's writing samples were analyzed. The result of this effort
was the accumulation of 6,012,359 running words and 25,632 different words.
Of the 25,632 different words, 11,061, or 43.2 percent, had frequencies of one 
or two and were not in the published volume. Still, as with other lexicons, the 
large number of words with a frequency of 1 or 2 accounted for a small portion 
of the total running words. 
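
The Token and Type counts quoted in this section can be set side by side with a few lines of arithmetic. The Python sketch below uses only the figures reported above for Clarke, for Carroll, Davies and Richman, and for Rinsland; the helper name `type_token_ratio` is an assumption, and the ratio itself is offered as one conventional summary rather than as an analysis performed in any of those studies.

```python
# (Tokens, Types) as reported in the text above.
corpora = {
    "Clarke (1921)": (28_292, 3_360),
    "Carroll, Davies and Richman (1971)": (5_088_721, 86_741),
    "Rinsland (1945)": (6_012_359, 25_632),
}


def type_token_ratio(tokens, types):
    """Different words (Types) divided by running words (Tokens)."""
    return types / tokens


ratios = {
    name: type_token_ratio(tokens, types)
    for name, (tokens, types) in corpora.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.4f}")

# Share of Rinsland's Types that occurred only once or twice.
rare_share = 11_061 / 25_632
print(f"Rinsland Types with frequency 1 or 2: {rare_share:.1%}")
```

Because the ratio of Types to Tokens generally falls as a corpus grows, these raw ratios should not be read as a direct comparison of vocabulary richness across corpora of such different sizes.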

In a more recent analysis of children's written vocabulary, Hillerich (1978) 
generated a list of words based on the compositions of children in one small 
Illinois school system. Hillerich's list suffers, however, from at least three
weaknesses. First, the failure to sample from a more heterogeneous population,
both geographically and demographically, limits the generality of his findings.
While this criticism may be leveled at nearly any study since true random
selection is unfeasible, the problem is particularly relevant when samples of
geographic convenience are used. Second, Hillerich encouraged teacher cor-
rections and editing of the children's compositions prior to entering the data for
analyses. The degree of contamination of results from such interference is
immeasurable. Without question, no worthwhile generalizations regarding spell-
ing error patterns would have been possible from such a study. Third, Hillerich
provides no analyses of the list of the children's compositions beyond some 
superficial summary statistics. 

VOCABULARY AND READING COMPREHENSION 

There is general acceptance that vocabulary facilitates comprehension (Davis,
1944; Ruddel, 1969); that is, the understanding of the sentence or paragraph
is enhanced by the knowledge of component words. In other words, vocabulary
development and language comprehension are components of the same topic
(MacGinitie, 1976). By teaching the skills necessary to determine word meaning,
a direct improvement in comprehending related content can be realized (Yap,
1979).

A number of studies that verified the impact of vocabulary instruction on 
comprehension yielded improved comprehension at the sentence level (Ahl-
fors, 1979; Blanchard, 1979; Jenkins, 1978). These studies provided vocabulary
instruction of some sort and tested its impact on standardized tests of reading
achievement in comprehension. A study by Thompson (1973) produced positive
effects in both vocabulary and comprehension after a short period of vocabulary
instruction.

A number of studies point to the finding that vocabulary building is in fact 
concept development. These studies involve some instructional activity in con-
cept development, or some experience related to words that links them to ideas
or concepts (Kaplan & Tuchman, 1980). The result of such efforts shows en-
hanced concept retention and concurrent vocabulary development. Lieberman
(1965) demonstrated this finding at the elementary level, and Dea (1978) and
Kessler (1976) showed the increase in concept learnings in secondary history
and science classrooms respectively. Humes (1977) demonstrated the effec-
tiveness of using concept-learning techniques to teach sophisticated vocabulary
content. 

Another strand of vocabulary research relates vocabulary acquisition to clas-
sification activities. The readers may classify concepts (Klausmeier, 1974) or
may enhance their grasp of a given concept by allowing contexts to build up
around a word. O'Rourke (1974) sees vocabulary development as a process
of acquiring structures that allow readers to see words as classified components
of a synergistic whole. Good readers classify and order concepts (vocabulary).
Dillon (1976) documents the continued development and refinement of word
classifications across the school grades, and Evanechko and Maguire (1972)
add that a substantial change in the organization of word categories occurs as
readers mature. It appears that vocabulary development is an integral part of 
concept acquisition and associated classifications, intellectual operations inti- 
mately related to thinking and comprehending. 

A position that has been sustained over time holds that vocabulary is a
powerful contributing factor to comprehension (Davis, 1944; Thorndike, 1973).
This relationship was explored as a causal one by Yap (1979). He analyzed a
set of reading data from second- and third-grade children on the hypothesis
that vocabulary and comprehension were causally related and that vocabulary
was the predominant causal factor. His five procedures for conducting causal 
analysis yielded considerable convergent validity. The data supported the hy- 
pothesis that vocabulary causes comprehension— a significant contribution to 
this position. 

Evaluating Compositions 

The study reported in this paper used an evaluation scale to estimate the 
communicativeness of each composition as a way of relating vocabulary pro- 
duction to the primary purpose for writing— communication. 

How to evaluate compositions remains the critical stumbling block in research.
All the methods in current use can be divided into two basic types: holistic and
atomistic. Holistic methods treat the composition as a whole entity, whereas
atomistic procedures analyze particular aspects of writing. 

Holistic Evaluation 

Holistic evaluation methods look at compositions as whole messages and 
rate them quickly and impressionistically. The rater is not interested in counting 
features of the writing and makes no corrections or tabulations of errors. There 
are three basic kinds of holistic procedures: 

1. Methods that match the student composition with other essays in a grad-
uated series.

2. Methods that assign point values for various general characteristics and
derive a total score.

3. Methods that assign an overall letter or number based on a predetermined 
scale. 
Cooper (1977) enumerates six specific holistic evaluation scales, all variations
of the three basic types:

1. Essay Scale: A series of sample essays is offered, and raters match
the student composition to the sample most like it. Cooper claims this is
probably the least reliable of the holistic methods. 

2. Analytic Scale: lists characteristics and a range of scores (e.g., 1 = low,
2 = average, 3 = high) for each quality. The kinds of characteristics typically
will include features like organization of composition, theme, vocabulary
diversity, etc. When detailed explanation of the characteristics is provided
and the raters are trained and practiced, Cooper believes this is the most
reliable system. 

3. Dichotomous Scale: a series of characteristics or questions about the 
composition to which the rater answers "yes" or "no." The total of "yes" 
answers is the rating for the composition. Cooper provides no reliability
data for this kind of system, but believes it probably is not a very accurate
measure. 

4. Feature Analysis: focuses on structural and stylistic characteristics of a 
composition. The rating scale lists qualities related to this one feature. For
example, if a researcher wanted to study the style of
children's narratives, the scale might include characteristics like emotional
quality, imagery, and vocabulary variety. This has the same strengths as
other analytic scales, but is used only for specific kinds of studies.

5. Primary Trait Scoring: focuses on traits of a composition that are especially 
related to the writing task at hand, and not for composition in general. For 
example, the researchers may wish to see how children perform on social
correspondence. A task of invitation writing is set up and scored by noting
whether the date and place of the party were included correctly, whether
a pleasant tone was set in the letter, and so forth. With specific criteria
for rating, high reliability can be achieved. 

6. General Impression Marking: a scale of usually five to seven increments. 
The rater assigns an over-all mark from the scale to the paper. This is
the simplest of all systems, but requires much rater training and practice
before acceptable reliability is achieved. 
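
Two of the scales listed above reduce to simple sums, which a short Python sketch can make concrete. The dichotomous scale totals the "yes" answers, and the analytic scale adds one score in the 1 to 3 range for each characteristic. The function names, the three yes/no questions, and the sample ratings are hypothetical; they are not taken from Cooper (1977).

```python
def dichotomous_score(answers):
    """Rating = total number of 'yes' (True) answers about a composition."""
    return sum(1 for answer in answers.values() if answer)


def analytic_score(ratings, low=1, high=3):
    """Sum per-characteristic scores, each on a low..high scale (1 = low, 3 = high)."""
    for characteristic, value in ratings.items():
        if not low <= value <= high:
            raise ValueError(f"{characteristic!r} score {value} is outside {low}-{high}")
    return sum(ratings.values())


# Hypothetical yes/no questions a rater might answer.
yes_no = {"stays on topic": True, "has a clear ending": False, "uses varied words": True}

# Characteristics named in the text; the scores are invented for illustration.
ratings = {"organization": 2, "theme": 3, "vocabulary diversity": 1}

print(dichotomous_score(yes_no))
print(analytic_score(ratings))
```

The range check in `analytic_score` is a design choice of this sketch: it rejects a score outside the stated scale rather than silently including it in the total.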

Another devotee of holistic evaluation, Hillerich (1970), points to his research
to support non-syntactic evaluation. He shows why he feels classroom practice 
and research evaluation of student writing should focus on clarity and interest 
appeal. He has devised a 6-criteria scale with 1 through 5 points assigned on 
the qualities: unit of thought, logical order of development, smooth transition, 
vocabulary variety, sentence variety, and vividness and appropriateness of 
expression. To this he adds a 9-point mechanical checklist for a general idea 
about the technical quality of the writing. 

Because on the surface holistic evaluation systems appear less "scientific"
than atomistic methods, many researchers have addressed the issue of de-
termining how holistic procedures compare to atomistic. Hogan and Mishler
(1980) have made comparative studies using both atomistic and holistic meth-
ods. They found acceptable reliability for either method: .95 reliability coefficient
for holistic methods; .81 to .95 range for atomistic. Their conclusion is that since 
both methods are about equally effective, one might as well select a holistic 
method because it is easier and less time-consuming to administer. 
Howerton, et al. (1977) studied compositions of 983 subjects in grades 4, 6,
9, and 12. The compositions were rated both quantitatively and qualitatively.
The atomistic measures included language productivity, vocabulary diversity,
spelling and syntactic maturity. The general quality was judged by the ETS 
Composition Evaluation Scale. The results demonstrated that quantity and qual- 
ity are significantly related. Particularly the following atomistic features are 
related to overall quality: total number of words, total number of sentences, 
percent of unique words, and number of words per T-unit. 
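
Three of the atomistic features named above (total words, total sentences, and percent of unique words) can be counted mechanically. The Python sketch below is illustrative only and is not the procedure of Howerton et al.; the function name `atomistic_measures`, the word pattern, and the rule that a sentence ends at a period, question mark, or exclamation point are assumptions. Words per T-unit is omitted because identifying T-units requires syntactic analysis.

```python
import re


def atomistic_measures(text):
    """Count total words, total sentences, and percent of unique words."""
    # Assumption: a word is a lowercased run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: sentences are delimited by '.', '!' or '?'.
    sentences = [part for part in re.split(r"[.!?]+", text) if part.strip()]
    percent_unique = 100 * len(set(words)) / len(words) if words else 0.0
    return {
        "total_words": len(words),
        "total_sentences": len(sentences),
        "percent_unique_words": percent_unique,
    }


# Hypothetical two-sentence composition.
measures = atomistic_measures("I like my dog. My dog likes me!")
for name, value in measures.items():
    print(name, value)
```

Children's compositions often lack conventional punctuation, so a sentence count obtained this way would need to be checked against a human reading.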

Judine and Griffin (1970) found much the same thing. In a study with 269 
seventh, ninth and eleventh graders they discovered that overall quality of 
compositions was highly correlated to the number of total words, total number
of T-units and the total number of clauses. 

Ultimately, the quality of holistic evaluation measures rests not with how they 
compare to atomistic methods but with the quality and training of the raters. 
They are, of course, the key to the success of holistic evaluation. Experts agree 
that inter-rater reliability can be in the high .80s and low .90s consistently if 
raters are trained and have samples representative of the criteria. Cooper (1977)
suggests that nearly perfect confidence can be achieved if raters have similar 
backgrounds and if each student writing sample is rated by more than one 
rater. Diederich (1974) shows that untrained raters have a .31 reliability. But 
with practice, reliability rises dramatically. He goes further to recommend a five-
point scale: 1=poorest 5% of papers; 2=20% below average; 3=50% middle
or average papers; 4=40% above average papers; and 5=the top 5% of the
papers. Because the most interrater disagreement is found in the middle range,
Diederich promises that this scale will render the best results because it lumps
the vague "average" papers into just one category. 
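
The sources cited above report reliability coefficients without saying how they were computed. One common way to express agreement between two raters is the Pearson correlation between their marks, and the Python sketch below assumes that definition for illustration. The function name `pearson_r` and the two sets of five-point marks are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a rater gives identical marks")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical marks from two raters on the same six papers (five-point scale).
rater_a = [1, 2, 3, 3, 4, 5]
rater_b = [1, 3, 3, 3, 4, 5]
print(f"inter-rater correlation: {pearson_r(rater_a, rater_b):.2f}")
```

A correlation rewards raters who rank papers alike even if one is consistently harsher, so a study concerned with exact agreement would need a different statistic.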

Procedures 
for Collecting and Compiling 
Children's Compositions 

In the fall of 1979, inquiries were mailed to several hundred school systems 
in the continental United States asking them to participate in a study of children's 
compositions. In the cover letter, the recipient was informed that the primary
purpose of the study was to examine vocabulary patterns of children in grades
1 through 8. The schools were also informed that the compositions would be 
used to conduct an analysis of other features of children's writing such as 
organization, theme, sense of punctuation, spelling, capitalization, etc. 

This procedure provided a quick sampling of potential school system parti-
cipants. Each participating school system identified a central contact person
who would distribute information and make local arrangements.

A packet sent to each contact person contained five items: a cover letter 
describing the purpose of the study and the type of participation requested, a 
set of instructions to teachers, sample instructions for students, example stimuli 
for student compositions, and a postal reply card. As an inducement to partic- 
ipate, schools were offered a computerized analysis of vocabulary generated 
by children in that school system. 

When a postal card was returned, indicating a school's continuing willingness
to participate in the study, a package of materials was mailed. The package
included a letter to the teacher, instruction sheets, student data sheets, and
mailing labels for return of the compositions. Enough copies of the items were
sent to provide one set for each participating classroom.

In January, 1980, a follow-up letter was sent to any school system that had 
not responded to the original request. If no response resulted from this second 
request, the school was dropped from the mailing list. 

Instructions to Teachers.

Teachers were asked to fill out an information sheet for their class. The
information sheet asked for limited demographic data on each of the students
and the teacher's subjective rating of the student's reading ability. The teacher
was informed that the data were to be used to aid in analyzing the compositions
and that all information would be held in confidence: only an identification
number would be used, and its correspondence to the student's name would
be destroyed as soon as the composition had been coded and archived.

Teachers were instructed to ask children to write a composition and to give 
them 45 minutes to an hour to complete the task. Children were to be en- 
couraged to write about things related to their own lives, especially what they 
were learning in school, or the people and ideas that they often thought about. 
On a separate sheet, sample stimuli for compositions were provided along with 
the suggestion that the teacher might use one or more as needed to stimulate 
the children's writing. 

As children wrote, they were to do so without consulting friends or teachers. 
Students were to be informed that this was not a test, so they were to spell, 
punctuate, etc. as they thought those items should appear. Students were 
encouraged to write clearly and legibly since the compositions were to be read
and typed into a computer for analysis. 

As soon as the compositions were completed, they were to be packaged 
securely and returned to the investigators. In all, more than 15,000 papers were 
received representing widely scattered areas of the United States. Of these, 
more than 4,000 were included in their entirety for the analysis reported herein. 
The result was a corpus of nearly one-half million running words (Tokens) which
comprise the list described in this study. 

Preparation for Analysis. 

The student compositions and demographic data were entered directly into 
permanent files on the Indiana University Wrubel Computing Center system via 
interactive terminals. Demographic data and a qualitative holistic rating
(described elsewhere in this report) were entered first. Eight demographic
descriptors were included for each child:

1 Identification code.
   a. school system number: 01 to 99.
   b. teacher number, within school: 01 to 99.
   c. student number, within teacher: 01 to 99.
2 Sex of student.
3 Ethnicity.
4 Grade level.
5 Teacher's rating of general reading ability.
6 Bilingual status of the student.
7 Standard Metropolitan Statistical Area (SMSA) rating of the locale of the
  school system.
8 Communicativeness rating of the composition.

The actual composition was entered directly into computer storage as verbatim
text with spelling errors noted and transformed. Capitalization, punctuation,
and grammar were not altered. Spelling errors were entered in the following
fashion: immediately following a spelling error, an asterisk or star (*) was
printed. This notation "alerted" the analytic program that the configuration
was a misspelled word. The starred misspelling was followed by the corrected
spelling enclosed in plus (+) signs. For consensus of spelling, all possible
misspellings were compared to the listed spelling in the Macmillan School
Dictionary (1977). For example, if a child wrote:

Dont forget yor luch mony.

the entered text would be:

Dont* +don't+ forget yor* +your+ luch* +lunch+ mony* +money+.

Note that the period at the end of the sentence follows the correction of the
last word; it did not follow the uncorrected form.
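
The coding notation described above lends itself to mechanical decoding. The following is a minimal present-day sketch in Python, not the FORTRAN program used in the study; the function name `decode` and the regular expression are illustrative assumptions. It recovers the corrected text and the list of (misspelling, correction) pairs from a coded line.

```python
import re

# Assumed shape of a coded error, following the notation described above:
#   misspelling*  +correction+
# Spaces inside the plus signs are tolerated.
CODED = re.compile(r"(\S+?)\*\s*\+\s*([^+]+?)\s*\+")

def decode(coded_text):
    """Return (corrected_text, [(misspelling, correction), ...])."""
    errors = [(m.group(1), m.group(2)) for m in CODED.finditer(coded_text)]
    corrected = CODED.sub(lambda m: m.group(2), coded_text)
    return corrected, errors

text, errors = decode("Dont* +don't+ forget yor* +your+ luch* +lunch+ mony* +money+.")
print(text)    # don't forget your lunch money.
print(errors)
```

Because the closing period is entered after the final correction, it survives the substitution unchanged.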

Raters were also informed not to tamper with grammar as long as the word
was spelled correctly, that is, if the basic word was a valid word but was used 
in a manner that might not correspond to standard English, it was not altered. 
Raters were also given the following guidelines: 

Verb forms. Do not make a correction if the basic word is spelled correctly but
is wrong in tense or number (e.g., "I see him yesterday." or "The man are not
here."). If the student attempts to use a contraction but omits the apostrophe,
this should be marked as an error in spelling (e.g., "I dont know." would become
"I dont* +don't+ know."). If the verb is spelled incorrectly, then make your
correction in terms of the verb form the student seemed to be trying to use;
this may be difficult to determine in early grades. If the student has written "I
sea him yesterday," then use the corrected spelling "see" without altering the
tense of the word. If, however, the student attempts something resembling past
tense (at least phonetically), then take this into account in your corrections
(e.g., "I sow him" or "I seed him" becomes "I saw him," "The balloon popt"
should be coded as "The balloon popped," and "I runned" would be "I ran").

Noun endings. If the word is spelled correctly but lacks the proper ending
(e.g., the singular is used where the plural is intended), do not mark this as an
error (e.g., "All the boy are here."). Possessive endings should be marked as
errors if the student uses an "s" without the apostrophe ("The boys book"
should be coded "The boy's book"). If the "s" is not used, do not add it (e.g.,
"The boy book" should not be corrected). If the noun is spelled incorrectly or
is a non-word, then it should be corrected (e.g., "The bok is on the tabel" or
"All the mans are here."). Sometimes a noun or pronoun used is so far off that
a basic change has to be made in the form during correction (e.g., "He or she
should clean up hes or shes room" cannot be improved with apostrophes and
should be altered to "his or her").

Adjectives used where an adverb is required. If the "ly" ending is omitted
but the word is spelled correctly, do not mark an error (e.g., "Your work should
be done neat").

Contextual errors. In any case which does not clearly fall into one of the
categories mentioned above, a decision must be made based on context and
meaning and the assumed word the student intended to use. All misspellings
of this type should be corrected. Often it is necessary to sound out a word so
that a phonetic approximation can be translated into a real word
("clous" = "clothes", "wode" = "would", etc.). Even if the student uses a word
which is a legitimate word in its own right, but is wrong in context, it should be
marked as a spelling error (e.g., "Timmy want to the circus" or "They came
to are house.").

As noted above, raters and recorders of the passages all used the same
dictionary. In all error cases, the first spelling was used as the correct spelling.
If, however, a student used a secondary spelling, it was rated as acceptable.
In cases where the string of letters was unrecognizable or was perhaps a
random string of letters (one such string was 35 letters long), the configuration
was coded with a double asterisk (**) immediately following the non-word. The
analytic program was alerted to ignore the configuration. Proper nouns were 
similarly coded with double asterisks to eliminate them from the lexicon. In 
retrospect, this was not the best decision since it may have been valuable to 
code non-words and proper nouns differently. 

In cases where a compound word was written as two separate words (e. g., 
"Beat Nik"), the misspelling was enclosed in brackets and the correction was 
enclosed in the standard plus signs. 

The compositions were entered into the computer via text mode over an 
interactive terminal. The compositions were entered as continuous text in the 
last 72 columns of a given line of text file. The first 8 columns contained the 
composition identification code and a line sequence number. The last word in 
the composition was followed by a double colon (::). This signified the end of 
the student's composition. If the identification code changed before the double 
colon was read, an error message was registered by the computer. 
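
The record layout and the end-of-composition check described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the original entry program: the split of the 8-column prefix into a 6-digit identification code followed by a 2-digit line sequence number is hypothetical (the source gives only the total width), and the function name `assemble` is invented for the example.

```python
def assemble(lines):
    """Join the text fields of 80-column lines into one composition per ID.

    Assumed layout: columns 1-6 identification code, columns 7-8 line
    sequence number, columns 9-80 composition text. A double colon (::)
    closes each composition.
    """
    compositions = {}
    current_id, parts = None, []
    for line in lines:
        ident, text = line[:6], line[8:80].rstrip()
        # The ID may only change after a double colon has been read.
        if current_id is not None and ident != current_id:
            raise ValueError(f"ID changed to {ident} before '::' closed {current_id}")
        current_id = ident
        if text.endswith("::"):
            parts.append(text[:-2].rstrip())
            compositions[current_id] = " ".join(parts)
            current_id, parts = None, []
        else:
            parts.append(text)
    if current_id is not None:
        raise ValueError(f"composition {current_id} has no closing '::'")
    return compositions
```

Raising an error when the identification code changes early mirrors the error message the text says the computer registered in that situation.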

Presented below is a sample composition as it was coded:

050103 15120 2063

He kold* +called+ a haled* +helicopter+ he got in the haled* +helicopter+
he sed* +said+ will you be mi* +my+ fed* +friend+ Tucdan** and he sed*
+said+ bi* +bye+ to his fenz* +friends+ I sed* +said+ gbiy* +good-by+
to* +too+::

Rating the Compositions.

Prior to its being entered into permanent computer storage, each composition
was rated on a scale from 1 to 5 indicating the relative value of
communicativeness. The progressive scale, adapted from Smith (1967), attempts to
quantify the qualitative differences in children's ability to use language to
convey a complete message. As such, it is a holistic rating. The score is not a
measure of spelling or grammar, although it is likely that those measures are
correlated. The skills implied in the rating include (a) a clear conception of
words as units of a sentence; (b) an arrangement of words in
subject-predicate-object order to form normal sentence patterns or complete
utterances; (c) the ability to mark transitions through connectives, such as
and, but, because, then, etc.; and (d) having in mind an overview of a message
that should be completed. Given these attributes, each composition was then
rated as follows:

1 - No communication related to the topic or task beyond one simple utterance
    (sentence).
2 - Communication related to the topic involving several sentences (or
    equivalent utterances such as run-on clauses) but lacking evident
    organization.
3 - Communication related to the topic involving several cohesive (organized)
    sentences but lacking completeness (too brief) or confusing.
4 - Communication related to the topic involving a number of cohesive
    sentences. Communication moves from a beginning, develops the idea,
    and concludes. A complete and organized statement.
5 - Same as number 4 plus originality of expression or idea, or a polished
    writing style that adds clarity and personality to the communication.

Since first-grade children are only beginning to acquire these skills, the scale 
was altered somewhat to reflect their level of experience: 

1 - No communication at the sentence level related to the story.
2 - Only one idea at the sentence level related to the story.
3 - More than one idea communicated at the sentence level, but the story is
    incomplete or confused or difficult to read.
4 - Three or more ideas or events are given in sequence, reaching a satisfactory
    conclusion, although there may be minor lapses in story or in written
    expression.
5 - Competent (as in 4) plus a variety or sophistication of sentence structure,
    or originality in bringing about a conclusion.

Raters were admonished to bear in mind that spelling, grammar, and neat- 
ness were not to be directly considered in this general measure of communi- 
cation. These were described as elements that contribute to effective com- 
munication, but their perfection is not needed to communicate well. 

To merit a rating of 4 or 5 the child must have demonstrated that he knows 
how to frame a sentence with a capital and a period, even though his com- 
position may have been one long run-on sentence. To get a rating of 4 he 
should have shown that he has captured the basic idea that written language 
is expressed in the sentence frame, although he may not have applied the 
sentence frame to every main subject-predicate-object unit he wrote. 

To merit a rating of 5 the child must have used a sentence frame more than 
once. But again, complete accuracy could not be expected from children whose 
knowledge of punctuating sentences may have been acquired primarily through 
an inductive process. 

Computerized Analysis. 

Once the compositions were prepared, rated, and archived onto permanent
9-track magnetic tape, they were subjected to a sequential set of FORTRAN
programs written specifically for this project. The first of the programs
transformed the child's composition into a serial alphabetized list of words that
appeared in the composition together with coordinate spelling errors where they
occurred. The program also generated a data record for each child, which
included the demographic data described above, the rating of the composition,
the total number of words (Tokens), the number of different words (Types), the
total number of spelling errors, and the number of different spelling errors
generated by the child. Those data records form the basis of several of the
analyses which follow.
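
The four counts in each child's data record can be illustrated with a short Python sketch. It is not the original FORTRAN program. It assumes that Tokens and Types are counted on the corrected spellings, that double-asterisk items are dropped before counting, and that case is ignored; the source does not settle these points, and the function name `summarize` is invented for the example.

```python
import re

CORRECTION = re.compile(r"(\S+?)\*\s*\+\s*([^+]+?)\s*\+")   # misspelling* +fix+
IGNORED = re.compile(r"\S+\*\*")                              # non-words, proper nouns
WORD = re.compile(r"[a-z]+(?:['-][a-z]+)*")

def summarize(coded_text):
    """Count Tokens, Types, and spelling errors in one coded composition."""
    errors = [m.group(1).lower() for m in CORRECTION.finditer(coded_text)]
    text = IGNORED.sub(" ", coded_text)                 # drop ** items
    text = CORRECTION.sub(lambda m: m.group(2), text)   # keep corrected forms
    words = WORD.findall(text.lower())
    return {
        "tokens": len(words),
        "types": len(set(words)),
        "spelling_errors": len(errors),
        "different_spelling_errors": len(set(errors)),
    }

sample = "he sed* +said+ will you be mi* +my+ fed* +friend+ Tucdan** and he sed* +said+ bi* +bye+"
print(summarize(sample))
```

In the sample, the misspelling "sed" occurs twice, so it adds two to the total number of spelling errors but only one to the number of different spelling errors.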

Each of the serial alphabetized children's lists was then merged into a
by-school, word-by-grade matrix containing word frequencies in the cells. This
by-school vocabulary list was then sent to the participating school. Subsequently,
the by-school lists were merged into the master list. A complete (300 page)
listing is available from the authors. A listing of the 500 most frequent words
is attached as Appendix A.
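
The two-stage merge described above (children's lists into a by-school matrix, then by-school matrices into a master list) can be sketched in Python. The data structures and function names here are illustrative assumptions, not a description of the original programs.

```python
from collections import Counter, defaultdict

def merge_lists(child_lists):
    """Merge (grade, word-frequency Counter) pairs into a word-by-grade matrix."""
    matrix = defaultdict(Counter)            # word -> Counter keyed by grade
    for grade, counts in child_lists:
        for word, n in counts.items():
            matrix[word][grade] += n
    return matrix

def merge_matrices(matrices):
    """Merge several by-school matrices into one master matrix."""
    master = defaultdict(Counter)
    for matrix in matrices:
        for word, by_grade in matrix.items():
            master[word].update(by_grade)    # Counter.update adds the counts
    return master

school_a = merge_lists([(1, Counter({"dog": 2, "the": 3})), (2, Counter({"the": 4}))])
school_b = merge_lists([(1, Counter({"the": 1, "cat": 1}))])
master = merge_matrices([school_a, school_b])
print(dict(master["the"]))   # {1: 4, 2: 4}
```

Keeping the grade as a key within each word's entry preserves the word-by-grade cells through both merges, so overall frequencies are simply row sums.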

A comparable pattern of merging student lists into by-school lists, which were
in turn merged into a master list, was followed in the generation of a list of
spelling errors in the children's compositions. That master list of spelling errors
(a 200 page listing) is available from the authors. A listing of the most frequent
spelling errors is attached as Appendix B.

The master word list was also merged with a selected set of other lists to
provide a basis for analysis of comparability. The lists included the Rinsland
(1945) list, the Iowa (Horn, 1926) list, and the DiVesta and Walls (1970) children's
semantic differential ratings of selected words. Word lists from seven basal
reading and spelling series were also merged (Ginn Reading, Scott Foresman
Reading, Macmillan Reading, Houghton Mifflin Reading, Holt, Rinehart and
Winston Reading, Scott Foresman Spelling, and McGraw-Hill Spelling). This
super list showing the relationship among 11 different word lists is available on
a selective basis from the authors.

Qualitative Analyses 

The master list was matched against other lists to examine the number of 
differences among the 5,000 most frequently used words. Thus cultural shifts
and other qualitative differences could be noted across the time period from 
Horn (1926) to Rinsland (1945) to the present study. Quantitative differences 
are also interesting in light of the significance placed on word lists in curriculum 
development activities. 
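
One plausible way to count such differences is a set comparison of the most frequent words on each list. The sketch below is an assumption about the method, since the source does not say exactly how differences were tallied; the function name and the toy lists are illustrative only.

```python
def words_not_shared(list_a, list_b, top_n=5000):
    """Count words among the top_n of list_a that are absent from the top_n of list_b.

    Both lists are assumed to be ordered by descending frequency.
    """
    top_a, top_b = set(list_a[:top_n]), set(list_b[:top_n])
    return len(top_a - top_b)

current = ["the", "and", "gross", "jeans"]       # toy data, not from the study
earlier = ["the", "and", "crude", "trousers"]
print(words_not_shared(current, earlier, top_n=4))   # 2
```

When both lists supply the same number of distinct words, the count is the same in either direction, so a single figure can describe the difference between two lists.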

Analyses of The Pupils' 
Written Vocabulary List 

In all, pupils in this sample produced a corpus of 482,487 running words
(Tokens) which were sorted into a list of 10,265 different words (Types). As in
previous studies, a few Types accounted for a large portion of language usage.
In studies by Cook and O'Shea (1914), Kucera and Francis (1967), Carroll et
al. (1971), and Hillerich (1978), for example, the first five or ten Types accounted
for upwards of 26% of the entire corpus of words. Hillerich (1978) found that
the five most frequent words in a study of children's vocabulary accounted
for 18.2% of the entire corpus of written compositions.

TABLE 1

Rank   Type              f         cf        p       cp

  1    (see note)      39,599     39,599   0.082   0.082
  2    (see note)      22,335     61,934   0.046   0.128
  3    (see note)      20,496     82,430   0.043   0.171
  4    (see note)      12,846     95,276   0.027   0.198
  5    TO              11,771    107,047   0.024   0.222
  6    WAS              6,673    113,720   0.014   0.236
  7    MY               6,363    120,083   0.013   0.249
  8    OF               5,855    125,938   0.012   0.261
  9    WE               5,514    131,452   0.011   0.271
 10    HE               5,435    136,887   0.011   0.284

Note: Of the Types at ranks 1 to 4, only A, THE, and AND are legible; the
fourth Type and the order of the four are not recoverable.

In the same fashion, a few Types dominated children's writing in this study.
Table 1 presents the 10 most frequent Types in the resulting list in descending
order of their frequencies (f). Also presented are cumulative frequencies (cf),
proportion of all Tokens (p), and cumulative proportion of all Tokens (cp). It may
be seen in that table that the first five Types accounted for 22.2% of the entire
corpus. The second five Types accounted for only an additional 6.2% of the
Tokens. (The 500 most frequent words are presented in descending order
of frequency as Appendix A.) By way of contrast, at the other extreme, 3,550
Types had a frequency of 1. Hence, in the low frequency end of the distribution,
34.6% of all Types accounted for less than 1.0% (0.7%) of the Tokens. When
Types with a frequency of 2 are included, another 1,531 Types are added. In
so doing, roughly 50% of the Types provide a combined contribution of only
1.4% of the entire corpus of Tokens.
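
The low-frequency percentages above follow directly from the reported counts. The short Python check below uses only figures stated in the text; the variable names are illustrative.

```python
TOKENS, TYPES = 482_487, 10_265      # corpus totals reported in the text
once, twice = 3_550, 1_531           # Types occurring exactly once / exactly twice

share_types_once = once / TYPES                    # share of Types with f = 1
share_tokens_once = once / TOKENS                  # their share of all Tokens
share_types_le2 = (once + twice) / TYPES           # share of Types with f <= 2
share_tokens_le2 = (once + 2 * twice) / TOKENS     # their share of all Tokens

print(round(share_types_once, 3), round(share_tokens_once, 3))   # 0.346 0.007
print(round(share_types_le2, 3), round(share_tokens_le2, 3))     # 0.495 0.014
```

A Type with a frequency of 2 contributes two Tokens, which is why the Token count for the combined group is 3,550 + 2 x 1,531 = 6,612.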

Distribution Density 

Kucera and Francis (1967) and Herdan (1960) have noted that an additional
index of the density of cases within a few Types may be gained through the
use of Yule's K. K provides a value linked to distribution values and increases
as the distribution becomes increasingly skewed (Yule, 1944). In those
Type-Token distributions that display a heavy dependence on a few Types, values
of K are elevated. In Kucera and Francis's analysis of the distribution density
of the Brown University corpus of adult reading material, for example, they
reported a value of K = 98.7077, which they took to indicate a very heavy
density. That is, most adult reading material was built around a minority of
Types. Previous studies, which report a higher concentration of common words
in children's productive vocabulary, would suggest that higher values of K would
result from a corpus of children's writing.
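
For readers who wish to compute the index, the sketch below uses the commonly cited formulation of Yule's K, K = 10,000 x (S2 - N) / N^2, where N is the number of Tokens and S2 is the sum over Types of the squared frequencies. The source does not print the formula, so this formulation and the function name `yules_k` are assumptions of the example.

```python
from collections import Counter

def yules_k(words):
    """Yule's K for a sequence of Tokens (commonly cited formulation)."""
    freqs = Counter(words)                  # Type -> frequency
    n = sum(freqs.values())                 # N: total number of Tokens
    spectrum = Counter(freqs.values())      # m -> number of Types occurring m times
    s2 = sum(m * m * v for m, v in spectrum.items())
    return 1e4 * (s2 - n) / (n * n)

print(yules_k("a a a b b c".split()))   # about 2222.2: heavy reliance on one Type
print(yules_k("a b c".split()))         # 0.0: every Type occurs once
```

K is zero when no Type repeats and grows as a few Types absorb more of the Tokens, which is the sense in which elevated values indicate a denser distribution.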

Table 2 presents a set of distribution statistics for the lists by grades 1 to 8
and for the entire corpus. The values of K range from a low of 142.728 at the
second grade to a high of 160.568 at the fourth grade. There was no regular
shift in values of K across grade levels. Clearly, however, the density distribution
was more extreme in the children's compositions than in adult reading matter.
Whether an analysis of adult written performance would yield greater or lesser
values of K is unclear at this point; however, one would suspect a reduction of
those values with improved lexical flexibility.

TABLE 2

Distribution Statistics of the List by Grade and by the Overall List

Grade      Mean    Standard     Skew    Kurtosis   Yule's K   Types   Tokens
                   Deviation

1         10.323     36.756    10.993    152.087    160.121     848     8754
2         13.508     67.992    16.644    340.871    142.728    1840    24855
3         16.229    105.816    22.349    612.380    159.982    2716    44078
4         18.495    143.613    27.331    944.123    160.568    3814    70540
5         17.633    142.867    28.593   1044.893    151.479    4396    77513
6         18.360    156.980    31.544   1285.034    147.657    5015    92076
7         17.031    151.829    37.008   1788.169    155.4?4    5174    88120
8         15.359    135.054    29.947   1092.967    157.010    4984    76551
Overall   47.003    574.298    45.005   2618.645    146.387   10265   482487

Note: "?" marks a digit that is illegible in the source.




Analyses 
of Children's Compositions 

In a study of predictive validity between multiple choice tests and holistic
ratings of children's compositions, Hogan and Mishler (1980) found correlations
of 0.65 for third graders and 0.68 for eighth graders. Lindell (1980), reporting
on a Swedish study of children's writing behavior, noted that the single best
predictor of a holistic score was the number of Types (different words) in the
composition. Further, other mechanistic attributes of the composition, such as
punctuation, the number of lines, the number of Tokens (total words), and the
number of different adverbs, were also significant predictors of holistic values
assigned to Swedish children's compositions.

Token Production 

Students in this sample wrote compositions containing an average of 102.03
Tokens per composition with a standard deviation of 62.70 Tokens. On the
average, girls produced more Tokens per composition (M = 109.10) than did
boys (M = 94.72), and the advantage was stable across grades. Figure 1
presents mean token and type productions by composition for boys and girls
across the eight grade levels. Only at grades 1 and 2 was some ambiguity
regarding sex differences in token production seen. However, no statistically
significant interaction of sex and grade was found in the data. As would be
expected, children produced more Tokens per composition as they progressed
through the first eight grades of schooling (p < .001).

Further, token production at each grade was higher among those
compositions rated higher in communicativeness (p < .001). There was, however,
a statistically significant interaction of the effects of grade and holistic
rating on token production. The grade by rating contribution to token production
may be seen in Figure 2. Note in that figure that while higher holistic ratings
were linked to higher token production at all grade levels, the advantage became
more marked as the grade level increased.

[Figure 2. Token production by rating and grade; horizontal axis: grade level]

Type Production 

Students in the study produced an average of 59.25 Types per composition
with a standard deviation of 31.87 Types. As with token production, girls were
more likely to produce more Types per composition (M = 62.34) than boys (M
= 56.04), and type production increased with grade level (p < .001). Type
production for boys and girls over the eight grade levels may be viewed in
Figure 1. Further, as with token production, children whose compositions were
rated as higher, more communicative, had more Types per composition (p <
.001). However, unlike token production, there was not a holistic rating by
grade-level interaction. The nature of the relationship between holistic rating, grade
level, and type production may be seen in Figure 3.

Type-Token Relationships 

In early studies of the relationship of type to token production in children's
language, a regular decline in the magnitude of the ratio of Types to Tokens
was observed as children matured (Chotlos, 1944; Johnson, 1944). While the
type-token ratio reflects a degree of diversity of language, the volume of a
child's verbal output is directly related to the reliability of the ratio, i.e., as volume
of output increases, so does the reliability of type-token relationships (Chotlos,
1944). Since the volume of output increased across grade levels, there is a
reasonable expectation that reliability suffered among ratios for younger
children. Further, as the correlation between the number of Types and Tokens
increases, the usefulness of the ratio in correlational studies becomes suspect.

[Figure 4. Type-token ratios by rating and grade]

Carroll (1968) cautions that a decrease in type-token ratios across age groups
may be artifactual because of differing base rates of Tokens. He suggests a
modification of the simple type-token relationship by dividing the number of
Types by the square root of two times the number of Tokens. While the latter
transformation may be mathematically more satisfactory, it fails to yield a
psychologically interpretable result. Hence, the cautions defined above
notwithstanding, the following results were observed between type and token
production.
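
The contrast between the simple ratio and Carroll's modification can be seen in a small Python sketch. The function names and the toy counts are illustrative assumptions; only the two formulas come from the text.

```python
import math

def type_token_ratio(types, tokens):
    """Simple type-token ratio."""
    return types / tokens

def corrected_ttr(types, tokens):
    """Carroll's modification: Types divided by the square root of twice the Tokens."""
    return types / math.sqrt(2 * tokens)

# Two hypothetical compositions, the second four times as long as the first.
for types, tokens in [(50, 100), (100, 400)]:
    print(tokens, type_token_ratio(types, tokens), round(corrected_ttr(types, tokens), 3))
```

The simple ratio falls from 0.50 to 0.25 for the longer composition, while the corrected value is about 3.536 in both cases. This illustrates how a decline in the simple ratio can reflect length alone.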

There was a strong linear relationship between the number of Types and
Tokens produced by children in the study. In assessing the magnitude of the
relationship of the unadjusted values of the two, the squared correlation was
0.922. There was a slight nonlinear effect. Following the lead of Chotlos (1944)
and Carroll (1968; Carroll et al., 1971), the type and token values were
transformed into logarithmic equivalents. The squared correlation between the
logarithmically transformed type and token productions was 0.933. Clearly there
was little room for growth in the zero-order relationship, but the increase resulting
from the transformation was significant. The regression line resulting from that
analysis was

Log (Types) = 0.8489 * Log (Tokens) + 0.1597



This function is similar to that reported by Chotlos (1944), although the
magnitude of the function in this study was substantially higher. When viewing the
same relationship within subgroups defined by holistic ratings of the
compositions, the log-linear slopes were roughly parallel and the intercepts increased
regularly with the holistic rating.
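
The fitted line can be evaluated directly. The base of the logarithm is not stated, so the Python sketch below treats it as a parameter; the function name and the default of natural logarithms are assumptions of the example.

```python
import math

SLOPE, INTERCEPT = 0.8489, 0.1597    # coefficients reported in the text

def predicted_types(tokens, base=math.e):
    """Types predicted by the log-linear regression for a given Token count."""
    log_types = SLOPE * math.log(tokens, base) + INTERCEPT
    return base ** log_types

print(round(predicted_types(102.03), 1))        # natural logs: about 59.5
print(round(predicted_types(102.03, 10), 1))    # base-10 logs: about 73.3
```

At the mean composition length of 102.03 Tokens, natural logarithms give a prediction close to the reported mean of 59.25 Types, which suggests, without establishing, that natural logarithms were used.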

Finally, Figure 4 presents the mean type-token ratios for children in the study
by grade and by holistic rating. As with previous analyses (see especially
Chotlos, 1944), the magnitude of the ratio diminishes across grades and, to a
lesser extent, across holistic ratings.

[Figure 5. Spelling errors by sex and grade; horizontal axis: grade level]

Spelling Error Production 

Students in the study produced a corpus of 10,473 different spelling errors.
The average number of spelling errors in a given composition was 6.89 with a
standard deviation of 6.91 spelling errors. The distribution, as may be inferred
from the above listed data, was positively skewed. The number of spelling
errors per composition ranged from 0 to 110. Girls produced more (M = 7.37)
than boys (M = 6.42), and the number of spelling errors increased initially, then
diminished with grade level (p < .001). This relationship may be seen in Figure
5.

Curiously, children whose compositions were rated higher in communication
value produced more spelling errors than those whose compositions were rated
as lower in communicativeness (p < .001). This, and the apparent disadvantage
seen among girls, may result from the fact that higher-rated compositions and
compositions written by girls had more Tokens. The larger compositions
provided greater opportunity for spelling errors. A more appropriate measure would
thus be a ratio of the number of spelling errors to the total number of Tokens.
When this ratio was created, the results were more in accord with expectations.
Higher-rated compositions had lower spelling-error-to-Token ratios than their
low-rated counterparts. An exception was seen among first-graders. However,
the instability of first grade is partially artifactual since a large proportion of
first-grade compositions that were rated as "1" had one or two words that may have
been spelled correctly but communicated little. The ratio in those cases would
be 0.00. The distribution of the spelling error proportion for compositions of
differing ratings across grades is seen in Figure 6.

Wing and Baddeley (1980) recently noted the need for a corpus of spelling
errors and analyses of such a corpus. They present analyses of spelling errors
generated from 40 college students' essays. On the average, college students
generated about 1.5 spelling errors per 100 words. In comparison, the rate for
eighth-grade children in this study was about 3.5 per hundred words and about
25.0 per hundred words for first-grade children. There was an apparent
non-linear decline in error rate from grades 1 to 8. This non-linear function may be
viewed from its logarithmic function. The resulting log-linear relationship yields
an R² = .209; the regression line for that relationship is

log (Error Ratio) = -.960 log (Grade) - .574

If the regression line is projected out to the sophomore year (Grade = 14),
the projected error rate is 0.5 per 100 with a one standard deviation confidence
interval of 0 and 1.8, which includes the Wing and Baddeley value.
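
The regression line can be evaluated as printed. The Python sketch below assumes base-10 logarithms and an error ratio expressed as a proportion of Tokens; neither is stated in the source, and the function name is illustrative.

```python
import math

def error_rate_per_100(grade):
    """Spelling errors per 100 Tokens implied by the printed regression line.

    Assumes base-10 logarithms and an error ratio expressed as a proportion.
    """
    log_ratio = -0.960 * math.log10(grade) - 0.574
    return 100 * 10 ** log_ratio

for grade in (1, 8, 14):
    print(grade, round(error_rate_per_100(grade), 1))
```

Under these assumptions the line gives roughly 26.7 errors per 100 Tokens at grade 1 and 3.6 at grade 8, close to the rates of about 25.0 and 3.5 reported above. At Grade = 14 the same evaluation gives about 2.1 per 100 rather than 0.5, so the published projection may rest on details (for example, a different scaling of the ratio) that cannot be recovered from the equation as printed.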

A limited listing of highly frequent spelling errors is included as Appendix B.
The listing is organized by a given word, its common misspelling, and the rate
of occurrence at each grade level. One should be careful, however, not to
assume that these are representative of the entire range of errors. The entire
list of 10,473 different spelling errors is available from the authors.

[Figure 6. Spelling error-token ratios by rating and grade; horizontal axis: grade level]

Vocabulary Changes Over Time 

The volume of word production is, of course, a function of several factors,
especially a pupil's available vocabulary and practice in writing. The powerful
effect of vocabulary on reading comprehension (more than 50% of the variance)
has been demonstrated by Davis (1944), Thorndike (1973-74), and Yap (1979).
This current composition study shows a similar effect on pupils' writing. The
difference in volume and in the number of unique words produced can be
illustrated with samples taken from first grade pupils.

Two pupils who scored in the bottom quarter in both reading and written
communication wrote the following:

Pupil 1: his dad was not happy.

Pupil 2: I went to grandmas house.

Two first grade pupils who scored in the top quarter in both reading and
written communication wrote the following:

Pupil 3: I went roller-skating with Jean and Dawn. It was fun. There was music.
         We did the bugie wugie too. My mom can't ice skate so you can imagine
         what she looks like on wheels.

Pupil 4: Meredith is my best friend. She hits me alot but thats OK we still get
         along. Sometimes we play baseball socker and football. Sometimes
         we swim in her blue sement swimming pool

There are, of course, other factors operating in these examples besides volume
of output and the number of unique words. Those factors, however, are not the
focus of this paper.

Several comparisons can be made between the vocabulary list generated
by the current sample and lists from earlier studies. Most applicable is the
Rinsland study (1945), which also generated a vocabulary from pupils' writing,
grades one to eight. Some of the differences can be attributed to the cultural
shifts that have occurred over the past four decades. Sample changes are
illustrated below.

CULTURAL 

There is a change in the vocabulary that children use that reflects cultural
changes: an ill-mannered person isn't crude (1945); he/she/it (you guessed it,
gender is no longer used) is gross (1981).

And it's not because of people's ignorance (1945); it's because of their
stupidity (1981).

Some people were a nuisance; now they're a pest.

Children wouldn't be caught dead in trousers; you gotta wear jeans.

The sermons at churches must be losing their fire: ministers, not preachers,
are giving them today.

It looks as if the following words might be here to stay for a while: 

all-star        double-deckers                      madder 
back-up         four-seater                         minibike 
beeper          fro                                 neater 
between-meal    glob                                neatest 
bionic          goof                                plop 
blast-off       gooey                               pro 
blob            gory                                runt 
brand-new       guts                                sissy 
built-in        guzzlers                            slop 
burp            he/she (as a unit)                  snuck 
chalk-talk      hi                                  ten-speed 
chugging        him/her/it (variant of he/she/it)   ugh 
clobber         hogging                             weirdest 
conk            jerks                               whoopee 
cooped          licks                               wow 
disco 

Still think times haven't changed? Elementary school children are using the 
following words in their writing today: abortion, acid, addicted, assassinate, 
brutality, bugging, busing, corruption, dope, downs, drugs, euthanasia, 
fuel-efficient, ghetto, hassle, heroin, holocaust, junkie, live-in, long-hairs, 
magnum, marijuana, molested, mugged, no smoking, nuclear, overpopulated, 
overdose, pollution, panic-stricken, power-mad, prefabricated, push-button, 
radiation, rape, rerun, riots, robot-like, sex, sexy, sixpack, slums, smog, 
terrorized, tranquilizers, warheads, and zillionaire. None of these words 
appeared in children's writing in the forties. 



AUDIENCE 

Two other comparisons reflect the qualitative shifts represented by this 
vocabulary study. Between the first 5,000 most frequently used words in the 
current sample and the first 5,000 in the Rinsland sample there are 1,758 
different words. A similar change can be noted when the current sample is 
compared to the Horn list (1928), a vocabulary list often used by publishers of 
children's books, although it was generated from writing for adults. A 
difference of 1,915 words occurs between the first 5,000 on the Horn list and 
the first 5,000 most frequently used words in the current sample. Appendix A 
contains the 500 most frequently used words in the current study. The complete 
word list, a 350-page document, is available from the authors. 
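
The kind of list comparison described above can be sketched in a few lines of 
Python. This is only an illustration under stated assumptions, not the 
procedure used in the study: the function names (top_n, unshared_words) are 
hypothetical, the frequencies are invented for the example, and ties are 
broken alphabetically because the monograph does not say how ties were 
handled. 

```python
def top_n(freqs, n):
    """Return the set of the n most frequent words.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:n]}


def unshared_words(current, earlier, n):
    """Compare the top-n words of two frequency lists.

    Returns (words only in the current top n, words only in the earlier top n).
    """
    cur, old = top_n(current, n), top_n(earlier, n)
    return sorted(cur - old), sorted(old - cur)


# Invented frequencies, for illustration only.
current = {"the": 90, "jeans": 12, "gross": 9, "pest": 7, "trousers": 1}
earlier = {"the": 80, "trousers": 10, "crude": 8, "nuisance": 6, "jeans": 1}

new_words, dropped_words = unshared_words(current, earlier, n=4)
print(new_words)      # words in the current top 4 but not the earlier top 4
print(dropped_words)  # words in the earlier top 4 but not the current top 4
```

When both lists are cut at the same rank, the number of words unique to one 
list equals the number unique to the other, which is presumably why a single 
figure such as 1,758 can summarize the difference between two 5,000-word lists. 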



Conclusions 

Beyond the specific analyses and results in this monograph, the data from 
this study serve as an important resource for future research. First, the 
lexicon that results from the children's compositions provides a current list 
on which future evaluative studies of children's writing may be based. Beyond 
offering a broad template against which to assess the range and variety of 
children's writing at the elementary levels, it sets the stage for the 
establishment of criteria for assessing individual productivity. Second, the 
merged "super list" provides a capability of comparing current writing behavior 
to other templates. Third, the generation of the compendium of spelling errors 
for such a broad sample of children's writing behavior may be unique. Insofar 
as our literature search has yet revealed, no other such compendium has been 
compiled in the past 50 years. It is clearly an important resource for future 
spelling research. Fourth, the availability of data on individual compositions 
provides a base for other normative and descriptive research. Fifth, the 
availability of the actual compositions permits future quantitative and 
qualitative analyses beyond those described in this monograph. For example, 
future researchers might study organizational patterns, ethnic and regional 
characteristics, and so on. Sixth, the availability of both quantitative and 
qualitative data on children's compositions provides a base upon which the 
relationship of mechanistic and holistic evaluations may be determined. 

The analyses reported in this monograph tend to confirm and to extend results 
of previous research in children's writing. Qualitative and quantitative trends 
were observed related to maturity and ability. Further, a general picture of 
children's writing behavior emerged. 

In some sense, the picture of current writing performance is less than 
encouraging. In both quantitative and qualitative analyses, the compositions 
were disappointing. Quantitatively, most written output was made up of a very 
small subset of Types. Qualitatively, the compositions lacked spontaneity or 
originality. By and large, they were boring and banal in the judgment of the 
raters. 

Nonetheless, several useful characteristics of children's writing may be gleaned 
from these analyses. First, simple quantity of output and quantity of Types 
output increased regularly over grade levels, with girls holding an advantage 
in quantity of production. Further, as the holistic qualitative rating of the 
composition increased, children were more likely to produce more Types and more 
Tokens and fewer spelling errors. School activity over time pays off in those 
measures. The more communicative compositions were notable for their quantity 
and breadth of output and their lower rate of mechanistic errors. A word of 
caution about the latter relationship is in order. It is not totally certain that type 
and token production and spelling errors do not contaminate raters' behavior. 
Greater quantity of output and low spelling errors may create a halo effect. 
Certainly previous writing research has indicated this to be so. Nonetheless, 
these analyses and others have indicated that type and token production are 
significant indicators of quality of output. If quality of composition is seen to 
reflect general ability, then the results of these studies parallel those of Chotlos 
(1944) who found comparable differences in type and token production related 
to standard tests of intellectual ability. 

Whether diversity of output improves with grade level is less clear. Overall 
lexical diversity of language was noted across grades. In viewing type-token 
ratios some improvement was found across grade levels and across qualitative 
ratings of the compositions. Answers to questions regarding diversity may thus 
be tied to the manner in which the data are analyzed. 
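
For readers unfamiliar with the measures, the following Python sketch shows 
one way to count tokens (running words), types (unique words), and a simple 
type-token ratio for a single composition. It is a minimal illustration, not 
the procedure used in this study: the function name type_token_profile and 
the tokenizing rule (lowercase the text, keep runs of letters joined by 
internal apostrophes or hyphens) are assumptions made for the example. 

```python
import re
from collections import Counter


def type_token_profile(text):
    """Count tokens, types, and the type-token ratio for one composition.

    Tokenizing rule (an assumption for this sketch): lowercase the text and
    keep runs of letters, allowing internal apostrophes and hyphens.
    """
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    ratio = n_types / n_tokens if n_tokens else 0.0
    return {"tokens": n_tokens, "types": n_types, "ttr": ratio}


# Two sentences taken from the Pupil 3 sample quoted earlier.
sample = "It was fun. There was music."
profile = type_token_profile(sample)
print(profile["tokens"], profile["types"])  # 6 tokens, 5 types ("was" repeats)
```

A raw ratio of this kind generally falls as a composition gets longer, because 
common words repeat. That is one reason conclusions about diversity can depend 
on how the data are analyzed. 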

These data certainly indicate some developmental considerations for the 
elementary school curriculum, grades 1 through 8, ages 6 through 14. Volume 
and diversity of vocabulary production, major correlates to success in written 
communication and in reading comprehension, need to be promoted among 
students, especially those who are less successful in the early years. Spelling 
errors, related as they are to success in written communication, may be dealt 
with as specific words or as patterns of words which are needed in a given 
grade. These data indicate the ages at which students want to use certain 
words (with their misspellings) and thus give curriculum developers a more 
secure sense of the likelihood of success in grade-by-grade spelling activities. 

Finally, this study reminds educators that measures of composition 
effectiveness may be both qualitative (holistic) and quantitative (mechanistic) with 
very high correlations between those types of measurement. The data from 
this study provide educators with several quantitative measures, in addition to 
a qualitative measure, as bases for examining compositions developmentally 
from ages six to fourteen. 